A Comparative Analysis of Extracted Grammars
Abstract
The development of wide-coverage grammars is at the core of robust NLP systems. This paper addresses grammar extraction from treebanks, examining broad coverage along three dimensions: the grammar formalism (context-free grammar, dependency grammar, lexicalized tree adjoining grammar), the domain of the annotated corpus (press reports, civil law), and the language of the corpus (English, Korean, Chinese, Italian). We extracted three grammars from an annotated corpus of Italian and compared their coverage of a test set; then, working on two subcorpora from different domains, we compared the cross-domain coverage of the extracted grammars; finally, we compared the grammars for four different languages. The results show relevant differences in coverage among formalisms and domains, and a more limited difference in the cross-linguistic comparison.
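As a rough illustration of the kind of experiment the abstract describes, the sketch below reads context-free rules off bracketed treebank trees and measures how many rule occurrences in a held-out set were already seen in training. It is a minimal sketch under stated assumptions, not the paper's procedure: the bracketed input format, the function names (`parse`, `productions`, `extract_grammar`, `rule_coverage`), the toy trees, and the rule-token definition of coverage are all hypothetical choices made here for illustration.

```python
from collections import Counter


def parse(s):
    """Parse a bracketed tree such as '(S (NP (PRP he)) (VP (VBZ runs)))'
    into nested (label, children) tuples; words stay as plain strings."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return node()


def productions(tree):
    """Yield (lhs, rhs) rules for nodes that dominate other nodes.
    Lexical rules (POS tag -> word) are skipped (an assumption of this sketch)."""
    label, children = tree
    sub = [c for c in children if isinstance(c, tuple)]
    if sub:
        yield (label, tuple(c[0] for c in sub))
        for c in sub:
            yield from productions(c)


def extract_grammar(trees):
    """Count every rule occurrence in the training trees."""
    return Counter(r for t in trees for r in productions(parse(t)))


def rule_coverage(grammar, test_trees):
    """Fraction of rule occurrences in the test trees that the grammar contains."""
    test_rules = [r for t in test_trees for r in productions(parse(t))]
    if not test_rules:
        return 0.0
    return sum(r in grammar for r in test_rules) / len(test_rules)


# Toy data, invented for this example only.
train = ["(S (NP (PRP he)) (VP (VBZ runs)))"]
test = ["(S (NP (PRP she)) (VP (VBZ sees) (NP (PRP him))))"]

grammar = extract_grammar(train)
coverage = rule_coverage(grammar, test)
print(coverage)  # 0.75: VP -> VBZ NP is the only unseen rule occurrence
```

Under the same assumptions, a cross-domain comparison would amount to extracting the grammar from one subcorpus and calling `rule_coverage` on trees from the other.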
Similar resources
Extraction of Tree Adjoining Grammars from a Treebank for Korean
We present the implementation of a system which extracts not only lexicalized grammars but also feature-based lexicalized grammars from the Korean Sejong Treebank. We report on some practical experiments where we extract TAG grammars and tree schemata. Above all, full-scale syntactic tags and well-formed morphological analysis in the Sejong Treebank allow us to extract syntactic features. In addition, ...
Comparing and integrating Tree Adjoining Grammars
Grammars are core elements of many NLP applications. Grammars can be developed in two ways: built by hand or extracted from corpora. In this paper, we compare a handcrafted grammar with a Treebank grammar. We contend that recognizing substructures of the grammars' basic units is necessary tures and semantic information which are rarely represented in the corpora. It would be ideal if we could c...
Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities
We report on investigations into hierarchical phrase-based translation grammars based on rules extracted from posterior distributions over alignments of the parallel text. Rather than restrict rule extraction to a single alignment, such as Viterbi, we instead extract rules based on posterior distributions provided by the HMM word-to-word alignment model. We define translation grammars progressi...
Alternating Regular Tree Grammars in the Framework of Lattice-Valued Logic
In this paper, two different ways of introducing alternation for lattice-valued (referred to as L-valued) regular tree grammars and L-valued top-down tree automata are compared. One is the way which defines the alternating regular tree grammar, i.e., alternation is governed by the non-terminals of the grammar and the other is the way which combines state with alternation. The first way is ta...
A Comparative Appraisal of Roadway Accident for Asia-Pacific Countries
This paper describes an attempt to shed some light on road safety in the Asia-Pacific region by characterizing and assessing its road accidents. The relevant national road accident data were extracted from centralized data sources of international agencies. Due to data incompleteness and missing values, 21 Asia-Pacific countries, representing more than half of the world’s population, were selected fo...